HKUST statistical machine translation experiments for IWSLT 2007
Abstract
This paper describes the HKUST experiments in the IWSLT 2007 evaluation campaign on spoken language translation. Our primary objective was to compare the open-source phrase-based statistical machine translation toolkit Moses against Pharaoh. We focused on Chinese to English translation, but we also report results on the Arabic to English, Italian to English, and Japanese to English tasks.
Similar resources
The MIT-LL/AFRL IWSLT-2007 MT system
The MIT-LL/AFRL MT system implements a standard phrase-based, statistical translation model. It incorporates a number of extensions that improve performance for speech-based translation. During this evaluation our efforts focused on the rapid porting of our SMT system to a new language (Arabic) and novel approaches to translation from speech input. This paper discusses the architecture of the MI...
Toward integrating word sense and entity disambiguation into statistical machine translation
We describe a machine translation approach being designed at HKUST to integrate semantic processing into statistical machine translation, beginning with entity and word sense disambiguation. We show how integrating the semantic modules consistently improves translation quality across several data sets. We report results on five different IWSLT 2006 speech translation tasks, representing HKUST’s...
The CASIA phrase-based statistical machine translation system for IWSLT 2007
This paper describes our phrase-based statistical machine translation system (CASIA) used in the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2007. In this year's evaluation, we participated in the open data track on clean text for Chinese-to-English machine translation. Here, we mainly introduce the overview of the system, the primary modules, th...
The University of Edinburgh system description for IWSLT 2007
We present the University of Edinburgh’s submission for the IWSLT 2007 shared task. Our efforts focused on adapting our statistical machine translation system to the open data conditions for the Italian-English task of the evaluation campaign. We examine the challenges of building a system with a limited set of in-domain development data (SITAL), a small training corpus in a related but distinc...
Larger feature set approach for machine translation in IWSLT 2007
The NTT Statistical Machine Translation System employs a large number of feature functions. First, k-best translation candidates are generated by an efficient decoding method of hierarchical phrase-based translation. Second, the k-best translations are reranked. In both steps, sparse binary features — of the order of millions — are integrated during the search. This paper gives the details of t...